Abstract

A genetic association study is a complicated process that involves collecting phenotypic data, generating genotypic data, analyzing associations between genotypic and phenotypic data, and interpreting the genetic biomarkers identified. SNPTrack is an integrated bioinformatics system developed by the US Food and Drug Administration (FDA) to support the review and analysis of pharmacogenetics data resulting from FDA research or submitted by sponsors. The system integrates data management, analysis, and interpretation in a single platform for genetic association studies. Specifically, it stores genotyping data and single-nucleotide polymorphism (SNP) annotations along with study design data in an Oracle database. It also integrates popular genetic analysis tools, such as PLINK and Haploview. SNPTrack provides genetic analysis capabilities and captures analysis results in its database as SNP lists that can be cross-linked to gene/protein annotations, Gene Ontology, and pathway analysis data for biological interpretation. With SNPTrack, users can carry out the entire bioinformatics workflow of a genetic association study. SNPTrack is freely available to the public at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm.



Introduction 

Personalized medicine has the potential to improve health outcomes and patient satisfaction. However, implementing personalized medicine based on individuals' biological information relies on genetic biomarkers that are identified through genetic association studies. High-throughput genotyping technologies have advanced to enable the simultaneous determination of genotypes for millions of single-nucleotide polymorphisms (SNPs). Concurrently, the International HapMap Project determined genotypes of over 3.1 million common SNPs in human populations [1]. Together, these advances make genetic association studies a feasible and promising research field for personalized medicine. However, the enormous amount of genetic data generated by high-throughput technologies poses a number of bioinformatics challenges. Storing and accessing the data, performing association tests, and interpreting results can no longer be readily done using the ad hoc approaches commonly used for much smaller candidate gene association studies. Furthermore, because the contributions of individual polymorphisms to a phenotype are typically quite small, appropriate analysis and interpretation techniques are key. Thus, identifying all associated polymorphisms and placing them in context is a necessary step in understanding their role in defining the phenotype or treatment response.

A number of bioinformatics algorithms and tools have been developed for managing and analyzing genetic data as well as for interpreting genetic biomarkers. However, none of them alone covers all of the bioinformatics tasks needed for a complete genetic association study, so scientists have had to use more than one tool for their studies. There was therefore a strong need for an integrated bioinformatics system.

Early in the Voluntary Exploratory Data Submission program [2], the FDA's National Center for Toxicological Research developed ArrayTrack™ to manage, analyze, and interpret microarray gene expression data [3,4]. ArrayTrack™ has since been used for reviewing and analyzing genomic data at the FDA and for genomic research in the scientific community. Building on the success of and experience with ArrayTrack™, SNPTrack was developed as a one-stop-shop bioinformatics solution that performs the same function for genetic data that ArrayTrack™ performs for gene expression data. SNPTrack offers a full suite of data storage and management, analysis, and interpretation tools for genetic association studies.

Implementation 

SNPTrack adopts a client-server architecture that integrates data management, analysis, and interpretation into a single system. The Oracle server stores and integrates phenotypic and genotypic data as well as annotations of genetic biomarkers from public resources on SNPs, quantitative trait loci (QTLs), genes, proteins, and pathways. The user interface, query mechanism, and data visualization features are implemented in Java. As depicted in Figure 1, SNPTrack has three major components: StudyDB, TOOL, and LIB.

StudyDB hosts and manages genotypic and phenotypic data. It supports importing three types of files in tab-delimited text format: annotation files for the genotyped SNPs (compiled for the study or provided by the chip provider), genotype data files, and phenotype data files (which may include sex, age, race, disease status, environmental exposure, and drug information such as dose, treatment response, and adverse events). Data are organized and presented in a tree-structured view with three node types: study owner or group (username), study title, and study data.
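The import step and the three-level tree can be illustrated with a minimal Python sketch. It is not SNPTrack's implementation, which uses an Oracle database and a Java client; the class name `StudyTree`, the genotype file layout (a `sample_id` column followed by one column of two-letter calls per SNP), the username, the study title, and the SNP identifiers are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# The three importable file types described for StudyDB.
FILE_TYPES = ("annotation", "genotype", "phenotype")


def read_tab_delimited(handle):
    """Parse a tab-delimited text file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


class StudyTree:
    """Hypothetical in-memory stand-in for StudyDB's three-level tree:
    study owner (username) -> study title -> study data files."""

    def __init__(self):
        self._tree = defaultdict(lambda: defaultdict(dict))

    def add_data(self, owner, study, file_type, rows):
        if file_type not in FILE_TYPES:
            raise ValueError(f"unsupported file type: {file_type}")
        self._tree[owner][study][file_type] = rows

    def render(self):
        """Return the tree as indented text, one node per line."""
        lines = []
        for owner in sorted(self._tree):
            lines.append(owner)
            for study in sorted(self._tree[owner]):
                lines.append("  " + study)
                for file_type in sorted(self._tree[owner][study]):
                    n_rows = len(self._tree[owner][study][file_type])
                    lines.append(f"    {file_type} ({n_rows} rows)")
        return lines


# Example: a two-sample genotype file with placeholder SNP IDs.
genotype_file = io.StringIO("sample_id\trs0001\trs0002\nS1\tAG\tCC\nS2\tAA\tCT\n")
tree = StudyTree()
tree.add_data("jdoe", "demo study", "genotype", read_tab_delimited(genotype_file))
print("\n".join(tree.render()))
```

Each imported file becomes a leaf under its study, mirroring the owner, study title, and study data node types.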

The TOOL component provides the data analysis features. Data are formatted and exported to the client computer for analysis with PLINK, a command-line program that features many statistical methods such as case-control association tests, various regression methods, permutation tests, false discovery rate control, and other algorithms [5]. Analysis commands in PLINK are issued and managed through gPLINK, a Java-based graphical user interface for managing PLINK commands [6]. Analysis results can be visualized through Haploview [7]. Linkage disequilibrium and haplotypes in the region around a SNP of interest can be downloaded from HapMap and viewed in Haploview. SNPTrack automatically loads these component tools onto the client computer and keeps them updated. SNPs of interest can also be saved to StudyDB. As needed, other stand-alone analysis tools such as SAS and R/Bioconductor can be integrated into the TOOL component.
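Exporting data for PLINK amounts to writing its standard text input files: a .ped file with one row per individual (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per SNP) and a .map file with one row per SNP (chromosome, SNP identifier, genetic distance, base-pair position). The sketch below is a hypothetical exporter, not SNPTrack's own; it assumes family IDs equal individual IDs, unknown parents, a case/control phenotype coded 2/1 (-9 for missing), and a genetic distance of 0. The function name `to_plink` and all identifiers are placeholders.

```python
def to_plink(samples, snps, genotypes):
    """Build the contents of PLINK text-format .ped and .map files.

    samples:   list of dicts with keys "id", "sex" ("M", "F", or other) and
               "case" (True, False, or None for missing)
    snps:      list of (snp_id, chromosome, bp_position) tuples, in file order
    genotypes: dict mapping (sample_id, snp_id) -> two-letter call such as "AG";
               absent keys are written as missing ("0 0")
    """
    sex_codes = {"M": "1", "F": "2"}       # anything else -> "0" (unknown)
    pheno_codes = {True: "2", False: "1"}  # case = 2, control = 1, missing = -9
    map_lines = [f"{chrom}\t{snp_id}\t0\t{pos}" for snp_id, chrom, pos in snps]
    ped_lines = []
    for s in samples:
        # Assumption: family ID = individual ID, both parents unknown ("0").
        fields = [s["id"], s["id"], "0", "0",
                  sex_codes.get(s["sex"], "0"), pheno_codes.get(s["case"], "-9")]
        for snp_id, _, _ in snps:
            call = genotypes.get((s["id"], snp_id))
            fields.extend(call if call else ("0", "0"))  # "AG" -> "A", "G"
        ped_lines.append(" ".join(fields))
    return "\n".join(ped_lines) + "\n", "\n".join(map_lines) + "\n"


samples = [{"id": "S1", "sex": "M", "case": True},
           {"id": "S2", "sex": "F", "case": False}]
snps = [("rs0001", "1", 1000), ("rs0002", "2", 2000)]
calls = {("S1", "rs0001"): "AG", ("S1", "rs0002"): "CC", ("S2", "rs0001"): "AA"}
ped_text, map_text = to_plink(samples, snps, calls)
```

Saved as `study.ped` and `study.map`, such files can be read by PLINK 1.x with `--file study`, for example `plink --file study --assoc` for a basic case-control association test.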

The LIB contains a collection of libraries to facilitate the interpretation of results from genetic studies. The libraries partially mirror the contents of dbSNP, GenBank, SWISS-PROT, LocusLink, Kyoto Encyclopedia of Genes and Genomes, Gene Ontology (GO), and others. The annotations from these databases are extracted to construct enriched libraries, such as SNPLib, GeneLib, ProteinLib, and PathwayLib. The SNP and QTL libraries are specifically designed for genetic association studies [8]. The libraries are cross-linked and support functions such as list-based queries, providing a mechanism for data interpretation. The SNP Library follows the release cycles of dbSNP and is updated about twice a year.
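Cross-linked libraries can be pictured as a chain of lookups, from a SNP list to genes and from genes to pathways. The sketch below stands in for SNPLib, GeneLib, and PathwayLib with tiny hypothetical dictionaries (`SNP_TO_GENE`, `GENE_TO_PATHWAYS`) whose identifiers are placeholders rather than real annotations; it shows the shape of a list-based query, not SNPTrack's query engine.

```python
from collections import Counter

# Hypothetical miniature libraries; identifiers are placeholders, not real annotations.
SNP_TO_GENE = {"rs0001": "GENE1", "rs0002": "GENE1", "rs0003": "GENE2"}
GENE_TO_PATHWAYS = {
    "GENE1": {"pathway_a"},
    "GENE2": {"pathway_a", "pathway_b"},
}


def genes_for_snps(snp_list, snp_to_gene):
    """Map a SNP list to the sorted set of genes it falls in; unannotated SNPs are skipped."""
    return sorted({snp_to_gene[s] for s in snp_list if s in snp_to_gene})


def pathways_for_genes(genes, gene_to_pathways):
    """Count how many genes from the list map to each pathway."""
    counts = Counter()
    for gene in genes:
        counts.update(gene_to_pathways.get(gene, ()))
    return counts


snp_list = ["rs0001", "rs0003", "rs9999"]  # rs9999 has no annotation here
genes = genes_for_snps(snp_list, SNP_TO_GENE)            # ['GENE1', 'GENE2']
pathways = pathways_for_genes(genes, GENE_TO_PATHWAYS)   # pathway_a: 2, pathway_b: 1
```

Counting genes per pathway gives a simple ranking of which pathways a SNP list touches most; a real analysis would add an enrichment statistic on top.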

Figure 1. SNPTrack's graphical user interface with the connections of its major components: StudyDB, TOOL and LIB.

Xu et al. Human Genomics 2012, 6:5
http://www.humgenomics.com/content/6/1/5

A typical workflow begins with importing the SNP panel, genotype, and phenotype data files into SNPTrack. Access permission (data security) is controlled by the user. Significantly associated SNPs can be identified using PLINK. Commonly used operations include filtering SNPs with the Hardy-Weinberg equilibrium test, followed by an allele frequency summary, allelic association tests, genotypic association tests, and/or linear or logistic regression analysis. Significantly associated SNPs found by the analysis tools can be saved as a SNP list in SNPTrack. Users can also import, export, edit, manage, and compare SNP lists. Specific SNPs of interest can be linked directly to a wide selection of external databases (dbSNP Report, Ensembl, HapMap, etc.) for more detailed information. Integrated libraries allow users to find genes and pathways related to SNPs.
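The filtering and testing steps of this workflow can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PLINK's implementation: it uses the 1-degree-of-freedom chi-square approximation for the Hardy-Weinberg test (an exact test is often preferred, especially for rare alleles), applies the Hardy-Weinberg filter to controls only (a common convention, assumed here), and uses a 0.001 threshold chosen purely for illustration. Genotype counts are (AA, AB, BB) tuples and the SNP identifiers are placeholders.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.
    Returns (statistic, p-value); empty or monomorphic SNPs return (0.0, 1.0)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0, 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_1df_pvalue(stat)


def allelic_chi2(case_counts, control_counts):
    """Allelic association test: chi-square on the 2 x 2 table of allele counts
    (rows: cases, controls; columns: A alleles, B alleles)."""
    table = [(2 * aa + ab, ab + 2 * bb) for aa, ab, bb in (case_counts, control_counts)]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / total
            if e == 0:  # monomorphic across the whole sample
                return 0.0, 1.0
            stat += (table[i][j] - e) ** 2 / e
    return stat, chi2_1df_pvalue(stat)


def filter_and_test(snp_data, hwe_alpha=1e-3):
    """snp_data maps SNP ID -> (case_counts, control_counts). SNPs whose controls
    deviate from HWE (p < hwe_alpha) are dropped; the rest get an allelic p-value."""
    results = {}
    for snp_id, (cases, controls) in snp_data.items():
        _, p_hwe = hwe_chi2(*controls)
        if p_hwe < hwe_alpha:
            continue
        results[snp_id] = allelic_chi2(cases, controls)[1]
    return results


snp_data = {
    "rs0001": ((30, 50, 20), (20, 50, 30)),  # (cases, controls)
    "rs0002": ((40, 20, 40), (50, 0, 50)),   # controls far from HWE: filtered out
}
results = filter_and_test(snp_data)
```

For `rs0001` the allele table is 110/90 in cases and 90/110 in controls, giving a statistic of 4.0 and a p-value of about 0.046; `rs0002` is removed by the Hardy-Weinberg filter before testing.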

Availability 

The SNPTrack client application works on all major operating systems, including Windows, Linux, and Mac. An instance of the SNPTrack server is hosted by the FDA and is freely available at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm. Users may also request the software for a local installation. Manuals and sample data are available at the above website.
Conclusions

SNPTrack is a one-stop-shop system for managing, analyzing, and interpreting genetic association data. It provides centralized data storage, performs complex genetic association analyses on large numbers of SNPs to identify genetic biomarkers, and finds related genes, pathways, and GO terms. SNPTrack is not only used by the FDA to review and analyze genetic data but is also freely available to the public.